Papadimitriou et al. BMC Genomics 2014, 15:272
http://www.biomedcentral.com/1471-2164/15/272



Genomics 



RESEARCH ARTICLE Open Access 



Comparative genomics of the dairy isolate 
Streptococcus macedonicus ACA-DC 198 
against related members of the Streptococcus 
bovis/Streptococcus equinus complex

Konstantinos Papadimitriou, Rania Anastasiou, Eleni Mavrogonatou, Jochen Blom, Nikos C Papandreou,
Stavros J Hamodrakas, Stephanie Ferreira, Pierre Renault, Philip Supply, Bruno Pot
and Effie Tsakalidou



Abstract 

Background: Within the genus Streptococcus, only Streptococcus thermophilus is used as a starter culture in food
fermentations. Streptococcus macedonicus though, which belongs to the Streptococcus bovis/Streptococcus equinus
complex (SBSEC), is also frequently isolated from fermented foods mainly of dairy origin. Members of the SBSEC
have been implicated in human endocarditis and colon cancer. Here we compare the genome sequence of the
dairy isolate S. macedonicus ACA-DC 198 to the other SBSEC genomes in order to assess in silico its potential
adaptation to milk and its pathogenicity status.

Results: Despite the fact that the SBSEC species were found tightly related based on whole genome phylogeny of
streptococci, two distinct patterns of evolution were identified among them. Streptococcus macedonicus,
Streptococcus infantarius CJ18 and Streptococcus pasteurianus ATCC 43144 seem to have undergone reductive
evolution resulting in significantly diminished genome sizes and increased percentages of potential pseudogenes
when compared to Streptococcus gallolyticus subsp. gallolyticus. In addition, the three species seem to have lost
genes for catabolizing complex plant carbohydrates and for detoxifying toxic substances previously linked to the
ability of S. gallolyticus to survive in the rumen. Analysis of the S. macedonicus genome revealed features that could
support adaptation to milk, including an extra gene cluster for lactose and galactose metabolism, a proteolytic
system for casein hydrolysis, auxotrophy for several vitamins, an increased ability to resist bacteriophages and
horizontal gene transfer events with the dairy Lactococcus lactis and S. thermophilus as potential donors. In addition,
S. macedonicus lacks several pathogenicity-related genes found in S. gallolyticus. For example, S. macedonicus has
retained only one (i.e. the pil3) of the three pilus gene clusters which may mediate the binding of S. gallolyticus to
the extracellular matrix. Unexpectedly, similar findings were obtained not only for the dairy S. infantarius CJ18,
but also for the blood isolate S. pasteurianus ATCC 43144.

Conclusions: Our whole genome analyses suggest traits of adaptation of S. macedonicus to the nutrient-rich
dairy environment. During this process the bacterium gained genes presumably important for this new ecological
niche. Finally, S. macedonicus carries a reduced number of putative SBSEC virulence factors, which suggests a
diminished pathogenic potential.

Keywords: Streptococcus, Genome, Adaptation, Gene decay, Pseudogene, Horizontal gene transfer, Pathogenicity,
Virulence factor, Milk, Niche



Background 

Lactic acid bacteria (LAB) constitute a very important 
group of microorganisms for the food industry, as well 
as the health of humans and animals [1,2]. Several species
in this group have a long history of safe use in fermented 
foods and thus belong to the very few bacteria that may 
qualify for the "generally regarded as safe" (GRAS) or the 
"qualified presumption of safety" (QPS) status according 
to FDA and EFSA, respectively [3]. Other LAB species are 
commensals of the skin, the oral cavity, the respiratory 
system, the gastrointestinal tract (GIT) and the genitals of 
mammals or other organisms. Furthermore, the presence 
of specific LAB strains, called "probiotics", in certain 
niches of the body is considered to promote the health 
of the host [2]. This benign nature of LAB, as well as their
economic value, often obscures the existence of notorious
LAB pathogens that are among the leading causes of
human morbidity and mortality worldwide [4].

This oxymoron about the vast differences in the patho- 
genic potential within the LAB group is probably best 
exemplified by streptococci. The genus basically consists 
of commensals that include several severe pathogens, like 
group A streptococci (GAS), group B streptococci (GBS) 
and Streptococcus pneumoniae [5]. Streptococcal pathogens 
are implicated in a plethora of diseases, ranging from 
mild (e.g. pharyngitis) to invasive and life-threatening 
(e.g. necrotizing fasciitis) infections [6]. In contrast,
Streptococcus thermophilus is one of the most frequent
starter LAB consumed by humans in yogurt and cheese
[7]. It is believed that this is the only streptococcal species
that, during its adaptation to the nutrient-rich milk
environment, underwent extensive genome decay, resulting
in the loss of pathogenicity-related genes present in
members of the genus [7,8].

Apart from S. thermophilus, other streptococci can grow
in milk and milk products. Such streptococci mainly belong
to the Streptococcus bovis/Streptococcus equinus complex
(SBSEC) [9]. The exact route that would explain their 
presence in milk is yet unidentified. In theory, since 
some of them can naturally occur in the GIT or on the 
teat skin of lactating animals, they could be passively 
transmitted to raw milk. In addition, species of the 
SBSEC are known to be involved in human cases of endo- 
carditis, meningitis, bacteremia and colon cancer [10-12]. 



However, Streptococcus macedonicus, which is a member
of this specific complex, has been suggested to be adapted
to milk and it has been hypothesized that it could be
non-pathogenic. These assumptions were based on the fact that
the primary ecological niche of S. macedonicus appears
to be naturally fermented foods, mostly of dairy origin,
similarly to S. thermophilus [13]. Initial in vitro and in vivo
evaluation did not support virulence of S. macedonicus
ACA-DC 198 [14]. PCR and Southern blotting analyses
indicated the absence of several Streptococcus pyogenes
pathogenicity genes. In addition, oral administration of
the organism at high dosages (8.9 log cfu daily) for an
extended period of time (12 weeks) to mice did not result
in any observable adverse effects including inflammation
in the stomach or translocation from the GIT to the
organs of the animals [14]. Moreover, strains of S. macedonicus
have been shown to present important technological
properties of industrial cultures like the production
of texturizing exopolysaccharides and anti-clostridial
bacteriocins [13].

Streptococcus macedonicus was originally isolated from
traditional Greek Kasseri cheese [15] and it is phylogenetically
related to Streptococcus gallolyticus subsp. gallolyticus
and Streptococcus pasteurianus (formerly known
as S. bovis biotypes I and II.2, respectively), as well as to
Streptococcus infantarius (formerly known as S. bovis biotype
II.1). The inclusion of S. macedonicus and S. pasteurianus
as subspecies of S. gallolyticus subsp. gallolyticus (from
this point on S. gallolyticus) has been previously suggested
[16], but this taxonomic reappraisal has not been formally
accepted so far [17]. Streptococcus gallolyticus and S.
pasteurianus are considered pathogenic. Preliminary
investigations concerning the mechanisms by which S.
gallolyticus causes endocarditis indicated that S. macedonicus
may lack at least some of the pathogenic determinants
implicated in this disease [18,19]. Furthermore, the
recent study of the genome of S. infantarius subsp. infantarius
CJ18 (from this point on S. infantarius) isolated
from spontaneously fermented camel milk in Africa has
indicated strain-dependent traits of adaptation to the dairy
environment despite the fact that the species is considered
as a putative pathogen [20]. Overall, the presence
in fermented foods of SBSEC species with a currently
unresolved pathogenicity status, such as S. macedonicus
and S. infantarius, may represent an underestimated cause
of concern in terms of food safety and public health,
which needs to be addressed.

Here we present the first complete genome sequence 
of S. macedonicus in order to shed light on the biology
of the species. We are particularly interested in assessing 
niche adaptation and in investigating the pathogenic 
potential of the strain analyzed based on comparative 
genomics against other complete genomes within the 
SBSEC. This is an important step to rationally deduce 
whether the bacterium is safe to be used as a starter or 
if extra technological measures are needed to avoid its 
presence in food fermentations. 

Results and discussion 

General features of Streptococcus macedonicus ACA-DC 
198 genome 

The circular chromosome of S. macedonicus ACA-DC 198
consists of 2,130,034 bp (Figure 1) with a G + C content
of 37.6%, which is among the lowest values within the
available complete streptococcal genomes (39.3% ± 1.7%,
n = 95 by May 2013). A total of 2,192 protein coding DNA
sequences (CDSs) were annotated, covering 87.3% of the
S. macedonicus chromosome. Of these, 192 were identified
as putative pseudogenes according to GenePRIMP [21]
analysis followed by manual curation. The bacterium also
carries 18 rRNA genes organized in 5 clusters co-localized
with most of the 70 tRNA genes. The S. macedonicus
genome was found to be 220-232 kb smaller and only
30 kb larger than the genomes of S. gallolyticus and S.
pasteurianus, respectively. Streptococcus infantarius has
one of the smallest genome sizes within the SBSEC
reported up to now (i.e. 141 kb smaller than that of S.
macedonicus). The percentage of potential pseudogenes
in S. macedonicus was 8.7%, in S. pasteurianus 7.7% and
in S. infantarius 4.9%. In contrast, the percentage of
pseudogenes in at least two S. gallolyticus strains (i.e. strains
UCN34 and ATCC 43143) has been found to be 2.1% or
less. This analysis is in accordance with previous findings
[9,22]. Based on the close phylogenetic relationship among
the four species, these observations suggest that the
genome of S. macedonicus, as well as those of S. pasteurianus
and S. infantarius, may be evolving under selective
pressures that allow gene loss events and genome decay
processes when compared to the S. gallolyticus genomes.
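
The figures quoted in this paragraph can be recomputed from the reported counts. The short Python sketch below uses only values stated in the text; the variable and function names are illustrative and are not part of the original analysis.

```python
# Values taken from the paragraph above; names are illustrative only.
total_cds = 2192            # annotated protein coding DNA sequences
pseudogenes = 192           # putative pseudogenes (GenePRIMP plus manual curation)
gc_strain = 37.6            # G + C content of S. macedonicus ACA-DC 198 (%)
gc_mean, gc_sd = 39.3, 1.7  # complete streptococcal genomes, n = 95


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


pseudo_pct = percent(pseudogenes, total_cds)
# Distance of the strain's G + C content from the mean, in standard deviations.
gc_z = (gc_strain - gc_mean) / gc_sd

print(f"pseudogene fraction: {pseudo_pct:.2f}%")
print(f"G + C deviation from mean: {gc_z:.1f} SD")
```

The G + C content of the strain sits about one standard deviation below the streptococcal mean, which is how "among the lowest values" can be read quantitatively.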

Whole genome phylogeny, comparative genomics, and core 
genome analysis 

A phylogenetic tree based on the currently available
complete streptococcal genome sequences was constructed
using the EDGAR software [23]. On this tree, S. gallolyticus,
S. macedonicus, S. pasteurianus, as well as S. infantarius
formed a single, monophyletic branch, providing strong
evidence for the taxonomic integrity of the SBSEC
(Additional file 1: Figure S1).

Figure 1 The circular map of the genome of Streptococcus macedonicus ACA-DC 198. Genomic features appearing from the periphery to
the centre of the map: 1. Forward CDSs (red); 2. Reverse CDSs (blue); 3. Putative pseudogenes (cyan); 4. rRNA genes (orange); 5. tRNA genes
(green); 6. %GC plot; 7. GC skew.

Subsequently, full chromosome alignments were performed
using progressiveMAUVE [24]. The analysis
revealed a mosaic pattern of homology organized in local
collinear blocks (LCBs) among S. gallolyticus, S. macedonicus
and S. pasteurianus (Figure 2A). Evidently, a significant
portion of the genetic information has been overall conserved,
as the majority of the LCBs are shared by all species.
In addition, chromosomal rearrangements seem to have
been rather minimal, as the number of LCBs showing a
change in relative genomic position among the strains
was low and their length short. Nevertheless, numerous
differences were also detected. Some LCBs were common
only among some of the strains, while some regions were
identified as strain-specific (and hence not included within
an LCB). The presence of such strain-specific regions suggests
that, in addition to gene loss mentioned earlier, gene
acquisition events mediated by horizontal gene transfer
(HGT) may have played a role during the evolution of the
three species (see below). Interestingly, the inclusion of
the S. infantarius genome in the MAUVE analysis resulted
in an increased number of LCBs with a decreased average
length. As the level of sequence conservation of individual
LCBs among the four species remains relatively high,
this observation suggests that genome structure
reorganization events occurred specifically in S. infantarius
(Figure 2B). Analysis with the EDGAR software revealed a
core genome of only 1,372 orthologous genes based on
the sequence and the current annotation of S. gallolyticus,
S. pasteurianus and S. macedonicus (Figure 3A, Additional
file 2: Table S1) [23]. Once more, inclusion of S. infantarius
increased the diversity, resulting in reduction of the
core genome by more than 100 genes among the four
species (Figure 3B, Additional file 3: Table S2). The significant
percentage of variable genes within the four
SBSEC species may underpin their adaptation to specific
environments.
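
The notion of a core genome, and why adding a fourth genome can only shrink it or leave it unchanged, can be illustrated with a set intersection. The Python sketch below is a toy example: the ortholog-group identifiers are hypothetical, and it does not reproduce how EDGAR assigns orthology.

```python
from functools import reduce

# Hypothetical ortholog-group identifiers per genome (toy data, not from the paper).
orthologs = {
    "S. gallolyticus": {"og1", "og2", "og3", "og4"},
    "S. pasteurianus": {"og1", "og2", "og3"},
    "S. macedonicus": {"og1", "og2", "og5"},
    "S. infantarius": {"og1", "og6"},
}


def core_genome(genomes):
    """Return the ortholog groups present in every listed genome."""
    return reduce(set.intersection, (orthologs[g] for g in genomes))


three = core_genome(["S. gallolyticus", "S. pasteurianus", "S. macedonicus"])
four = core_genome(list(orthologs))

print(sorted(three), sorted(four))
```

Because the four-genome core is an intersection over a superset of genomes, it is always a subset of the three-genome core, which matches the direction of the change reported above.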

Genes involved in the survival in the GIT 

It has been established that S. gallolyticus displays the
notable ability to accumulate and metabolize a broad range
of complex carbohydrates from plants when compared to
other streptococci [25]. The necessity for this repertoire of
carbohydrate-degrading activities has been considered to
reflect the adaptation of S. gallolyticus to the rumen of
herbivores [22,25]. Preliminary analysis indicated that at
least some of the relevant genes are either entirely absent
or they have been converted into pseudogenes in the
genomes of S. macedonicus, S. pasteurianus and S. infantarius
(Table 1). The presence of pseudogenes related
to carbohydrate metabolism reinforces the notion that
S. macedonicus, S. pasteurianus and S. infantarius have
undergone genome decay processes during adaptation to
their ecological niches. The entire glycobiome of the SBSEC
members was further analyzed based on the data available
in the CAZy database (Additional file 4: Table S3) [26].
Important differences in the distribution of enzymes among
the SBSEC members were observed for all CAZy categories
including glycoside hydrolases (GHs), glycosyltransferases,
polysaccharide lyases, carbohydrate esterases and
carbohydrate-binding modules (CBMs). Streptococcus
macedonicus and Streptococcus infantarius had the smallest
glycobiome within the SBSEC. The two strains had
only 24 and 23 GHs, respectively, while the remaining SBSEC
members had more than 40. Given that most of these GHs are
potentially involved in plant and dietary carbohydrate catabolism
(e.g. GH1, GH3, GH13, GH36 etc.) [27], it could be
hypothesized that S. macedonicus and S. infantarius have
a diminished necessity for such enzymes in their ecological
niche. Streptococcus pasteurianus had the highest
number of GHs, some of which were unique among
SBSEC (i.e. GH35, GH78, GH79, GH85, GH92, GH125).
This observation indicates differences in the range of
carbohydrates the strain is able to catabolize in comparison
to the other members of the complex. Interestingly, none
of the SBSEC members were found to carry GHs that are
implicated in the degradation of host derived oligosaccharides
(e.g. GH33 and GH98) [27]. In contrast, Streptococcus
gallolyticus strains, S. macedonicus and S. infantarius
had hits in the CBM41 family, while S. pasteurianus had hits in the
CBM32 family, both of which have been associated with
the recognition of host glycans [27,28].

Furthermore, S. gallolyticus can detoxify toxic compounds
encountered in the rumen and other environments.
Again, S. macedonicus, S. pasteurianus and S. infantarius
lack some of the genes involved in detoxification (Table 1).
None of them carry genes for tannin hydrolysis similar to
GALLO_0933 or GALLO_1609. The potential to degrade
additional phenolic compounds like gallic acid seems to
be comparable between S. gallolyticus and S. pasteurianus.
In contrast, S. infantarius has no orthologs of either PadC
(GALLO_2106) or GALLO_0906, i.e. the two gallic acid
decarboxylases found in S. gallolyticus UCN34, while S.
macedonicus has retained only PadC. Furthermore, the
bsh gene (GALLO_0818), coding for a bile salt hydrolase,
is present in all four species with the exception of S. macedonicus,
in which it appears as a pseudogene. Thus, our
findings clearly suggest that not only S. macedonicus, but
also S. pasteurianus and S. infantarius have deviated from
S. gallolyticus in their potential to cope with the harsh
environment of the GIT of herbivores.
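
The detoxification comparison above can be summarized as a small presence/absence matrix. The Python sketch below transcribes the calls for five genes from Table 1 and counts the intact copies per species; the encoding ("+", "p", "-") and the function name are illustrative choices, not part of the original analysis.

```python
# Status of detoxification-related genes, transcribed from Table 1.
# "+" = intact gene, "p" = pseudogene, "-" = not found.
species = [
    "S. gallolyticus UCN34",
    "S. pasteurianus",
    "S. macedonicus",
    "S. infantarius",
]
status = {
    "tanA (GALLO_0933)":    ["+", "-", "-", "-"],
    "tannase (GALLO_1609)": ["+", "-", "-", "-"],
    "padC (GALLO_2106)":    ["+", "+", "+", "-"],
    "GALLO_0906":           ["+", "+", "-", "-"],
    "bsh (GALLO_0818)":     ["+", "+", "p", "+"],
}


def intact_counts(table, names):
    """Count, for each species, how many genes are scored as intact ("+")."""
    counts = {name: 0 for name in names}
    for calls in table.values():
        for name, call in zip(names, calls):
            if call == "+":
                counts[name] += 1
    return counts


counts = intact_counts(status, species)
for name in species:
    print(f"{name}: {counts[name]} of {len(status)} intact")
```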

Genes involved in the growth in milk or dairy products 

Dairy LAB are considered fastidious microorganisms due
to their adaptation to growth in milk that is particularly
nutritious by nature. Lactose and milk proteins (both
caseins and whey proteins) are characteristic of the
dairy environment. LAB are able to ferment lactose to
lactic acid and they have evolved a proteolytic system
for the degradation of milk proteins down to amino
acids [1,29].

Figure 2 Chromosome alignments of the Streptococcus bovis/Streptococcus equinus complex members as calculated by
progressiveMauve. Chromosome alignments among Streptococcus gallolyticus, Streptococcus macedonicus and Streptococcus pasteurianus
(A) and all the aforementioned streptococci and Streptococcus infantarius (B). Local collinear blocks (LCBs) of conserved sequences among
the strains are represented by rectangles of the same colour. Connecting lines can be used to visualize synteny or rearrangement. LCBs
positioned above or under the chromosome (black line) correspond to the forward and reverse orientation, respectively. The level of
conservation is equivalent to the level of vertical colour filling within the LCBs (e.g. white regions are strain-specific). Sequences not placed
within an LCB are unique for the particular strain.

Figure 3 Core genome analysis of members of the Streptococcus bovis/Streptococcus equinus complex. Whole CDS Venn diagrams of
Streptococcus gallolyticus, Streptococcus macedonicus and Streptococcus pasteurianus (A) or Streptococcus gallolyticus, Streptococcus infantarius,
Streptococcus macedonicus and Streptococcus pasteurianus (B). In (B) Streptococcus gallolyticus ATCC 43143 was selected as a representative of the
S. gallolyticus species, since it has the longest genome size among the three sequenced strains.

All SBSEC species are able to utilize lactose and to
catabolize galactose. Sequence similarity searches revealed
a gene cluster (SMA_0197 - SMA_0211) dedicated to lactose
metabolism with a unique organization in SBSEC
when compared to those previously reported for other
LAB (Table 2). The typical sequence of lac genes is interrupted
in the majority of SBSEC strains by genes coding
for the IIA, IIB and IIC components of a PEP-PTS
(SMA_0202 - SMA_0204). Annotation of this PEP-PTS
varies among the SBSEC species/strains and for this reason
functional analysis is required to properly determine
its exact function. In contrast to other SBSEC species,
these three PTS genes are absent from S. infantarius. The
lactose-specific PTS found at the end of the lac gene cluster
(SMA_0206 - SMA_0210) is also inactivated in S. infantarius
through disruption of the lacT antiterminator gene
by transposases [20]. Interestingly, the lac gene cluster in S.
macedonicus contains two 6-phospho-beta-galactosidase
(lacG) genes that may be indicative of adaptation of this
particular species to milk. Galactose can also be catabolized
through the Leloir pathway and a galRKTE operon coding
for the relevant enzymes was previously determined in S.
infantarius [30]. The gal operon is conserved in all SBSEC
species analyzed here (Table 2).

A partial gal-lac operon galT(truncated)/galE1/M/lacSZ
with high sequence identity to S. thermophilus is also
present in the genome of S. infantarius [30]. It has been
demonstrated that the lactose and galactose permease
(lacS) and the β-galactosidase (lacZ) are responsible for
the uptake and initial hydrolysis of lactose in S. infantarius
in a manner similar to that employed by S. thermophilus
[20]. This gal-lac operon of S. infantarius is missing
from the other SBSEC strains as a whole. A LacZ ortholog
(SGPB_0344) is only present in S. pasteurianus and
dispersed galE and galM genes can be found in the S.
gallolyticus and S. pasteurianus genomes. Similarly to
the presence of the extra gal-lac operon in S. infantarius,
we detected a second lac gene cluster in S. macedonicus
(SMA_1156 - SMA_1165), also suggesting adaptation to
the milk environment. This second gene cluster is solely
present in S. macedonicus and not in any other SBSEC
member. Surprisingly, an additional lacTFEG region coding
for a complete lactose PEP-PTS and a 6-phospho-beta-galactosidase
is present in the genomes of S. gallolyticus and
S. pasteurianus. This is an unexpected finding since S.
gallolyticus and S. pasteurianus have hardly ever been
related to milk up to now [9].
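
The locus-tag ranges quoted for the two lac gene clusters of S. macedonicus can be expanded into explicit gene lists for comparison with Table 2. The Python sketch below assumes that locus tags within a range are numbered consecutively, which holds for the S. macedonicus entries listed in Table 2; the function name is hypothetical.

```python
import re

TAG_PATTERN = re.compile(r"([A-Za-z0-9]+)_(\d+)")


def expand_locus_range(first, last):
    """List all locus tags from first to last, assuming consecutive numbering."""
    m_first = TAG_PATTERN.fullmatch(first)
    m_last = TAG_PATTERN.fullmatch(last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError("locus tags must share the same prefix")
    prefix = m_first.group(1)
    width = len(m_first.group(2))  # preserve zero padding, e.g. 0197
    start, end = int(m_first.group(2)), int(m_last.group(2))
    if end < start:
        raise ValueError("range end precedes range start")
    return [f"{prefix}_{n:0{width}d}" for n in range(start, end + 1)]


first_cluster = expand_locus_range("SMA_0197", "SMA_0211")
second_cluster = expand_locus_range("SMA_1156", "SMA_1165")
print(len(first_cluster), len(second_cluster))
```

Under that assumption the first cluster spans 15 locus tags and the second spans 10, which agrees with the number of S. macedonicus entries listed for each cluster in Table 2.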

We then investigated the proteolytic system of S. macedonicus
and the rest of the SBSEC members adapting the



Table 1 Genes in the Streptococcus bovis/Streptococcus equinus complex potentially involved in adaptation to the rumen

Function | Gene | S. gallolyticus UCN34 | S. gallolyticus ATCC 43143 | S. gallolyticus ATCC BAA-2069 | S. pasteurianus ATCC 43144 | S. macedonicus ACA-DC 198 | S. infantarius CJ18
Pullulanase | - (a) | GALLO_1462 | SGGB_1458 | SGGBAA2069_c14850 | SGPB_1362 (t) | SMA_1464 (s), SMA_1465 (s) | Sinf_1270
Pullulanase | - | GALLO_0781 | SGGB_0764 | SGGBAA2069_c07530 | - | SMA_0719 (p), SMA_0720 (r), SMA_0721 (p) | -
α-amylase, neopullulanase | - | GALLO_0753 | SGGB_0736 | SGGBAA2069_c07260 | - | - | -
Fructan hydrolase | fruA | GALLO_0112 | SGGB_0110 | SGGBAA2069_c01280 | - | - | -
Beta-1,4-endoglucanase V (cellulase) | - | GALLO_0330 | SGGB_0358 | SGGBAA2069_c03180 | - | - | -
Cinnamoyl ester hydrolase | cinA | GALLO_0140 | SGGB_0137 | SGGBAA2069_c01580 | - | - | -
Mannanase | - | GALLO_0162 | SGGB_0206 | SGGBAA2069_c01800 | - | - | Sinf_0174 (p)
Endo-beta-1,4-galactanase | - | GALLO_0189 | SGGB_0233 | SGGBAA2069_c02070 | SGPB_0176 | SMA_0214 (p) | Sinf_0197 (p)
Pectate lyase | - | GALLO_1577 | SGGB_1576 | SGGBAA2069_c16050 | - | - | Sinf_1418
Pectate lyase | - | GALLO_1578 | SGGB_1577 | SGGBAA2069_c16060 | SGPB_1461 (p) | SMA_1582 (p), SMA_1583 (s), SMA_1584 (s) | -
Malate transporter | mleP | GALLO_2048 | SGGB_2031 | SGGBAA2069_c20060 | SGPB_1855 | SMA_1945 | Sinf_1750
Malate dehydrogenase | mleS | GALLO_2049 | SGGB_2032 | SGGBAA2069_c20070 | SGPB_1856 | SMA_1946 | Sinf_1751
PTS system, mannitol-specific IIBC component | mtlA | GALLO_0993 | SGGB_0982 | SGGBAA2069_c09680 | - | SMA_0905 (p) | -
Mannitol operon transcriptional antiterminator | mtlR | GALLO_0994 | SGGB_0983 | SGGBAA2069_c09690 | - | SMA_0906 (p), SMA_0907, SMA_0908, SMA_0909, SMA_0910, SMA_0911, SMA_0912, SMA_0913, SMA_0914, SMA_0915, SMA_0916, SMA_0917 | -
PTS system, mannitol-specific IIA component | mtlF | GALLO_0995 | SGGB_0984 | SGGBAA2069_c09700 | - | - | -
Mannitol-1-phosphate 5-dehydrogenase | mtlD | GALLO_0996 | SGGB_0985 | SGGBAA2069_c09710 | - | - | -
α-amylase | - | GALLO_0757 | SGGB_0740 | SGGBAA2069_c07300 | - | - | -
α-amylase | amyE | GALLO_1632 | SGGB_1646 | SGGBAA2069_c16600 | SGPB_1505 (p) | SMA_1612 (t) | Sinf_1443
α-amylase | - | GALLO_1043 | SGGB_1033 | SGGBAA2069_c10200 | SGPB_0905 | SMA_0972 | Sinf_0846
Tannase | tanA | GALLO_0933 | SGGB_0917 | SGGBAA2069_c09070 (s), SGGBAA2069_c09080 (s) | - | - | -
Tannase (similar to tanA) | - | GALLO_1609 | SGGB_1624 | SGGBAA2069_c16370 | - | - | -
Phenolic acid decarboxylase | padC | GALLO_2106 | SGGB_2089 | SGGBAA2069_c21040 | SGPB_1899 | SMA_2074 | -
Carboxymuconolactone decarboxylase | - | GALLO_0906 | SGGB_0891 | SGGBAA2069_c08850 | SGPB_0775 | - | -
Bile salt hydrolase | bsh | GALLO_0818 | SGGB_0803 | SGGBAA2069_c07920 | SGPB_0678 | SMA_0753 (p) | Sinf_0639

(a) Not found; (t) Truncated; (s) Split CDSs corresponding to fragments of the original gene not yet characterized as pseudogenes; (p) Pseudogenes; (r) Transposase genes in italics.



Table 2 Genes in the Streptococcus bovis/Streptococcus equinus complex potentially involved in lactose and galactose metabolism

Function | Gene | S. gallolyticus UCN34 | S. gallolyticus ATCC 43143 | S. gallolyticus ATCC BAA-2069 | S. pasteurianus ATCC 43144 | S. macedonicus ACA-DC 198 | S. infantarius CJ18
Lactose-specific PTS system repressor | lacR | GALLO_0176 | SGGB_0220 | SGGBAA2069_c01940 | SGPB_0163 | SMA_0197 | Sinf_0181
Galactose-6-phosphate isomerase, LacA subunit | lacA | GALLO_0177 | SGGB_0221 | SGGBAA2069_c01950 | SGPB_0164 | SMA_0198 | Sinf_0182
Galactose-6-phosphate isomerase, LacB subunit | lacB1 | GALLO_0178 | SGGB_0222 | SGGBAA2069_c01960 | SGPB_0165 | SMA_0199 | Sinf_0183
Tagatose-6-phosphate kinase | lacC | GALLO_0179 | SGGB_0223 | SGGBAA2069_c01970 | SGPB_0166 | SMA_0200 | Sinf_0184
Tagatose 1,6-diphosphate aldolase | lacD2 | GALLO_0180 | SGGB_0224 | SGGBAA2069_c01980 | SGPB_0167 | SMA_0201 | Sinf_0185
Putative PTS system, IIA component | - (a) | GALLO_0181 | SGGB_0225 | SGGBAA2069_c01990 | SGPB_0168 | SMA_0202 | -
Putative PTS system, IIB component | - | GALLO_0182 | SGGB_0226 | SGGBAA2069_c02000 | SGPB_0169 | SMA_0203 | -
Putative PTS system, IIC component | - | GALLO_0183 | SGGB_0227 | SGGBAA2069_c02010 | SGPB_0170 | SMA_0204 | -
Aldose 1-epimerase | lacX | GALLO_0184 | SGGB_0228 | SGGBAA2069_c02020 | SGPB_0171 | SMA_0205 | Sinf_0186
Transcriptional antiterminator | lacT | GALLO_0185 | SGGB_0229 | SGGBAA2069_c02030 | SGPB_0172 | SMA_0206 | Sinf_0187 (p), Sinf_0188 (r), Sinf_0189, Sinf_0190 (p)
6-phospho-beta-galactosidase | lacG | GALLO_0186 | SGGB_0230 | SGGBAA2069_c02040 | SGPB_0173 | SMA_0207 | -
Transcriptional antiterminator | lacT | - | - | - | - | SMA_0208 (p) | -
Lactose-specific PTS system, IIA component | lacF | GALLO_0187 | SGGB_0231 | SGGBAA2069_c02050 | SGPB_0174 | SMA_0209 | Sinf_0191
Lactose-specific PTS system, IIBC component | lacE | GALLO_0188 | SGGB_0232 | SGGBAA2069_c02060 | SGPB_0175 | SMA_0210 | Sinf_0192
6-phospho-beta-galactosidase | lacG2 | - | - | - | - | SMA_0211 | Sinf_0193 (p), Sinf_0194, Sinf_0195 (p)
Galactose repressor | galR | GALLO_0197 | SGGB_0241 | SGGBAA2069_c02150 | SGPB_0184 | SMA_0222 | Sinf_0205
Galactokinase | galK | GALLO_0198 | SGGB_0242 | SGGBAA2069_c02160 | SGPB_0185 | SMA_0223 | Sinf_0206
Galactose-1-P-uridyl transferase | galT | GALLO_0199 | SGGB_0243 | SGGBAA2069_c02170 | SGPB_0186 | SMA_0224 | Sinf_0207
UDP-glucose 4-epimerase | galE | GALLO_0200 | SGGB_0244 | SGGBAA2069_c02180 | SGPB_0187 | SMA_0225 | Sinf_0208
Beta-galactosidase | lacZ | - | - | - | SGPB_0344 | - | -
Glucokinase | glcK | GALLO_0594 | SGGB_0562 | SGGBAA2069_c05300 | SGPB_0467 | SMA_0546 | Sinf_0470
Beta-galactosidase | lacZ | - | - | - | - | - | Sinf_0935
Lactose and galactose permease | lacS | - | - | - | - | - | Sinf_0936
Aldose 1-epimerase | galM | - | - | - | - | - | Sinf_0937
UDP-glucose 4-epimerase | galE1 | - | - | - | - | - | Sinf_0938
Galactose-1-P-uridyl transferase | galT | - | - | - | - | - | Sinf_0939 (p)
UDP-glucose 4-epimerase | lacS | - | - | - | - | - | Sinf_1514
Aldose 1-epimerase | lacX | - | - | - | - | SMA_1156 | -
6-phospho-beta-galactosidase | lacG2 | - | - | - | - | SMA_1157 | -
Lactose-specific PTS system, IIBC component | lacE | - | - | - | - | SMA_1158 | -
Lactose-specific PTS system, IIA component | lacF | - | - | - | - | SMA_1159 | -
Tagatose 1,6-diphosphate aldolase | lacD | - | - | - | - | SMA_1160 | -
Tagatose-6-phosphate kinase | lacC | - | - | - | - | SMA_1161 | -
Galactose-6-phosphate isomerase, LacB subunit | lacB | - | - | - | - | SMA_1162 | -
Galactose-6-phosphate isomerase, LacA subunit | lacA1 | - | - | - | - | SMA_1163 | -
Glucokinase | glcK | - | - | - | - | SMA_1164 | -
Lactose phosphotransferase system repressor | lacR | - | - | - | - | SMA_1165 | -
Transcription antiterminator | lacT | GALLO_1046 | SGGB_1036 | SGGBAA2069_c10230 | SGPB_0907 | - | -
Lactose-specific PTS system, IIA component | lacF | GALLO_1047 | SGGB_1037 | SGGBAA2069_c10240 | SGPB_0908 | - | -
Lactose-specific PTS system, IIBC component | lacE | GALLO_1048 | SGGB_1038 | SGGBAA2069_c10250 | SGPB_0909 | - | -
Phospho-beta-galactosidase | lacG | GALLO_1049 | SGGB_1039 | SGGBAA2069_c10260 | SGPB_0910 | - | -
Aldose 1-epimerase | galM | GALLO_0137 | SGGB_0134 | SGGBAA2069_c01550 | SGPB_0130 | - | -
UDP-glucose 4-epimerase | galE1 | GALLO_0728 | SGGB_0709 | SGGBAA2069_c06910 | SGPB_0601 | - | -

(a) Not found; (p) Pseudogenes; (r) Transposase genes in italics.

scheme previously described by Liu and co-workers (i.e.
excluding housekeeping proteases or proteases involved in
specific cellular processes other than the acquisition of
amino acids) [29]. In milk, casein utilization by LAB is
initiated after hydrolysis by a cell-envelope associated
proteinase (CEP) releasing oligopeptides. The oligopeptides
are then transferred intracellularly via specialized peptide
transport systems where they are systematically degraded
into amino acids by an array of intracellular peptidases.
The four species have essentially the same proteolytic
system, albeit showing some differences (Table 3). None
of them has a typical PrtP CEP, but S. gallolyticus and S.
infantarius carry a lactocepin coding gene. The lactocepin
of the SBSEC shows >63% sequence similarity to the PrtS
CEP involved in the degradation of milk proteins in S.
thermophilus [31,32]. The exact role of lactocepin in
SBSEC species needs to be experimentally examined.
SBSEC strains like S. macedonicus may require CEP activity
to be provided by other bacteria when growing in milk.
This is a common strategy of nonstarter LAB that rely on
starter CEP-producing strains for casein hydrolysis [33].
Streptococcus infantarius carries two oligopeptide transport
systems (Opp) [20], but all the other SBSEC species
have only one such system. All SBSEC strains carry a proton
motive force (PMF)-driven DtpT transporter for the
transport of di- and tri-peptides and they all possess an
entire repertoire of proteolytic enzymes including endopeptidases,
general aminopeptidases and specialized
peptidases (Table 3). They only lack enzymes of the
PepE/PepG (endopeptidases) and the PepI/PepR/PepL
(proline peptidases) superfamilies in accordance with previous
observations for streptococci and lactococci [29]. The
conservation of this proteolytic system among streptococci
in the SBSEC despite their presumed adaptation to
different ecological niches [20,22,25] indicates that it may
somehow be essential. Furthermore, S. macedonicus and
the other SBSEC members are prototrophs for several
amino acids (data not shown) and only S. pasteurianus
has been reported to be unable to synthesize tryptophan
[22]. Thus, the preservation of an entire proteolytic system
by SBSEC members while retaining the ability to synthesize
most, if not all, amino acids is puzzling, especially when
considering that some of them have obviously undergone
extensive genome decay processes. It could be hypothesized
that this property of SBSEC species may provide a competitive
advantage in poor environments, but this needs to
be further investigated.

Apart from amino acids, S. gallolyticus UCN34 also
carries complete pathways for the synthesis of a number
of vitamins including riboflavin, nicotinamide, pantothenate,
pyridoxine, and folic acid, while the biosynthetic
pathways for biotin and thiamine are partial [25]. The
genes potentially involved in the de novo biosynthesis of
pyridoxine in the SBSEC strains were determined based
on the respective pathway of S. pneumoniae D39 [34]. The
corresponding loci are conserved among S. gallolyticus
strains but once more S. macedonicus, S. pasteurianus and
S. infantarius appear to have undergone a heterogeneous
gene loss process, indicating the necessity for exogenous
supply of some of these vitamins (Table 4). For example, S.
macedonicus lacks the bioBDY, panBCD and ribDEAH
loci involved in the biosynthesis of biotin, pantothenate and
riboflavin, respectively. In addition, the presence of pseudogenes
or truncated/split genes may have disrupted the
biosynthesis of pyridoxine, nicotinamide and thiamine
through the routes analyzed here. It is not uncommon for
LAB to be auxotrophic for several vitamins [35], though
milk and other dairy products may contain all essential
vitamins to sustain the growth of these microorganisms.
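
The gene loss pattern described above can be summarized as a simple presence/absence lookup. The following is a minimal sketch, not the authors' procedure: the dictionary covers only an illustrative subset of Table 4 for S. macedonicus ACA-DC 198, `None` marks genes the text reports as missing, and the function name `missing_by_locus` is hypothetical.

```python
from collections import defaultdict

# Illustrative subset for S. macedonicus ACA-DC 198 (assumption: only genes
# named in the text as missing, plus three locus tags listed in Table 4).
SMA_VITAMIN_GENES = {
    "bioB": None, "bioD": None, "bioY": None,                # biotin
    "panB": None, "panC": None, "panD": None,                # pantothenate
    "ribD": None, "ribE": None, "ribA": None, "ribH": None,  # riboflavin
    "ribF": "SMA_1086",
    "folC": "SMA_1137",
    "pdxR": "SMA_1031",
}


def missing_by_locus(genes):
    """Group absent genes by their three-letter locus prefix (e.g. 'rib')."""
    missing = defaultdict(list)
    for gene, locus_tag in genes.items():
        if locus_tag is None:
            missing[gene[:3]].append(gene)
    return dict(missing)


if __name__ == "__main__":
    for prefix, absent in missing_by_locus(SMA_VITAMIN_GENES).items():
        print(prefix, "missing:", ", ".join(absent))
```

A lookup of this kind only records annotation status; it says nothing about whether an alternative route or uptake system compensates for the loss.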

Genomic islands (GIs) and unique genes of Streptococcus 
macedonicus 

GIs are sites of HGT that can uncover important features 
of the plasticity of a bacterial genome and they are primar- 
ily linked to gene gain processes. We used the IslandViewer 
application [36] to identify GIs of the SBSEC members in 
parallel. Streptococcus macedonicus had 14 predicted GIs 
with an average length of 18,109 bp corresponding to a 
total sequence of 253,523 bp or 11.9% of the size of the
bacterium's genome (Additional file 5: Figure S2). This
percentage of externally acquired DNA is higher than in
the other SBSEC members, in which it ranged from
8.8% in S. gallolyticus ATCC BAA-2069 down to 5.9% in
S. gallolyticus UCN34.
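
As a quick consistency check on the figures quoted above, the mean island length and the genome size implied by the 11.9% share can be recomputed from the reported total. This is only an arithmetic sketch; the function name `gi_summary` is hypothetical and the implied genome size is approximate because the percentage is rounded.

```python
def gi_summary(n_islands, total_bp, genome_fraction):
    """Return (mean island length, genome size implied by the GI share)."""
    mean_len = total_bp / n_islands
    implied_genome_bp = total_bp / genome_fraction
    return mean_len, implied_genome_bp


if __name__ == "__main__":
    # Values reported in the text for S. macedonicus: 14 GIs, 253,523 bp, 11.9%.
    mean_len, implied_genome_bp = gi_summary(14, 253_523, 0.119)
    print(f"mean GI length: {mean_len:.0f} bp")
    print(f"implied genome size: {implied_genome_bp / 1e6:.2f} Mb")
```

The recomputed mean agrees with the reported 18,109 bp, and the implied genome size is about 2.13 Mb.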

As could be expected, the highest degree of sequence
conservation among GIs was observed in the S. gallolyticus
strains (e.g. S. gallolyticus UCN34 GIs 2, 6, 7, 8 and 9).
When different SBSEC species were compared, a number
of GIs were only partially conserved (e.g. S. gallolyticus
UCN34 GIs 1, 3, 6, 7, 8 and 9). Unique GIs were also
present in most genomes analyzed (e.g. S. pasteurianus
GIs 2, 4, 6 and 8). Partially conserved GIs may be remnants
of GIs acquired before speciation events in the SBSEC and
their subsequent gene decay may be the result of adaptation
to diverged ecological niches. The existence of unique GIs
among the SBSEC species, whose acquisition must have
been more recent (i.e. most probably after speciation), also
points in the same direction. Furthermore, our analysis
suggests that S. macedonicus shares stretches of GI
sequences exclusively with S. infantarius among the SBSEC
members (e.g. in S. macedonicus GIs 1, 4, 5, 6, 7, 8 and 14)
in accordance with previous findings [20]. Potential donors
of GI sequences were identified from best BLASTN hits
showing sequence identity > 90%. In several instances
sequence segments within S. macedonicus GIs may have
derived from more than one donor (Additional file 6:
Figure S3). Potential donors of the S. macedonicus GIs
were Streptococcus agalactiae, Streptococcus intermedius,



Table 3 Genes in the Streptococcus bovis/Streptococcus equinus complex potentially involved in proteolysis of milk proteins

Function | Gene | S. gallolyticus UCN34 | S. gallolyticus ATCC 43143 | S. gallolyticus ATCC BAA-2069 | S. pasteurianus ATCC 43144 | S. macedonicus ACA-DC 198 | S. infantarius CJ18
Lactocepin | prtS | GALLO_0748 | SGGB_0730 | SGGBAA2069_c07210 | - | - | Sinf_0588
Oligopeptide ABC transporter, substrate-binding protein | oppA | GALLO_0324 | SGGB_0352 | SGGBAA2069_c03120 | SGPB_0276 | SMA_0353 | Sinf_0305
 |  | GALLO_1412 | SGGB_1406 | SGGBAA2069_c14340 | SGPB_1328 | SMA_1347 | Sinf_1225
 |  | GALLO_1413 | SGGB_1407 | SGGBAA2069_c14350 | - | - | Sinf_1226, Sinf_1825
Oligopeptide ABC transporter, permease protein | oppB | GALLO_0325 | SGGB_0353 | SGGBAA2069_c03130 | SGPB_0277 | SMA_0354 | Sinf_0306, Sinf_1824
Oligopeptide ABC transporter, permease protein | oppC | GALLO_0326 | SGGB_0354 | SGGBAA2069_c03140 | SGPB_0278 | SMA_0355 | Sinf_0307, Sinf_1823
Oligopeptide ABC transporter, ATP-binding protein | oppD | GALLO_0327 | SGGB_0355 | SGGBAA2069_c03150 | SGPB_0279 | SMA_0356 | Sinf_0308, Sinf_1822
Oligopeptide ABC transporter, ATP-binding protein | oppF | GALLO_0328 | SGGB_0356 | SGGBAA2069_c03160 | SGPB_0280 | SMA_0357 | Sinf_0309, Sinf_1821
Dipeptide/tripeptide permease | dtpT | GALLO_0638 | SGGB_0613 | SGGBAA2069_c05810 | SGPB_0507 | SMA_0600 | Sinf_0519
Cysteine aminopeptidase C | pepC | GALLO_0478 | SGGB_0452 | SGGBAA2069_c04140 | SGPB_0379 | SMA_0442 | Sinf_0388
Aminopeptidase N | pepN | GALLO_1143 | SGGB_1134 | SGGBAA2069_c11310 | SGPB_1002 | SMA_1066 | Sinf_0984
Methionine aminopeptidase | pepM | GALLO_0775 | SGGB_0758 | SGGBAA2069_c07470 | SGPB_0642 | SMA_0713 | Sinf_0604
Glutamyl aminopeptidase | pepA | GALLO_0101 | SGGB_0101 | SGGBAA2069_c01190 | SGPB_0100 | SMA_0113 | Sinf_0111
 |  | GALLO_0151 | SGGB_0195 | SGGBAA2069_c01680 | SGPB_0141 | - | -
Endopeptidase | pepO | GALLO_2172 | SGGB_2204 | SGGBAA2069_c21680 | SGPB_1933 | SMA_2096 | Sinf_1874
Oligoendopeptidase | pepF | GALLO_0669 | SGGB_0651 | SGGBAA2069_c06210 | SGPB_0551 | SMA_0630 | Sinf_0554
 |  | GALLO_1516 | SGGB_1511 | SGGBAA2069_c15390 | SGPB_1410 | SMA_1526 | Sinf_1335
Dipeptidase | pepD | GALLO_0732 | SGGB_0713 | SGGBAA2069_c06950 | SGPB_0605 | SMA_0668 | Sinf_1301
Xaa-His dipeptidase | pepV | GALLO_0931 | SGGB_0915 | SGGBAA2069_c09050 | SGPB_0797 | SMA_0836 | Sinf_0699
Peptidase T | pepT | GALLO_1366 | SGGB_1360 | SGGBAA2069_c13560 | SGPB_1287 | SMA_1297 | Sinf_1183
X-prolyl-dipeptidyl aminopeptidase | pepX | GALLO_1959 | SGGB_1942 | SGGBAA2069_c19090 | SGPB_1791 | SMA_1862 | Sinf_1676
Aminopeptidase P | pepP | GALLO_1901 | SGGB_1885 | SGGBAA2069_c18550 | SGPB_1732 | SMA_1811 | Sinf_1626
Xaa-proline dipeptidase | pepQ | GALLO_1583 | SGGB_1582 | SGGBAA2069_c16110 | SGPB_1466 | SMA_1589 | Sinf_1424
Table 4 Genes in the Streptococcus bovis/Streptococcus equinus complex potentially involved in the biosynthesis of vitamins

Vitamin | Gene | S. gallolyticus UCN34 | S. gallolyticus ATCC 43143 | S. gallolyticus ATCC BAA-2069 | S. pasteurianus ATCC 43144 | S. macedonicus ACA-DC 198 | S. infantarius CJ18
Biotin (B8, partial) | bioB | GALLO_1916 | SGGB_1900 | SGGBAA2069_c18670 | SGPB_1745 | - (a) | -
 | bioD | GALLO_1915 | SGGB_1899 | SGGBAA2069_c18660 | SGPB_1744 | - | -
 | bioY | GALLO_1914 | SGGB_1898 | SGGBAA2069_c18650 | SGPB_1743 | - | -
Pyridoxine (B6) | pdxS | GALLO_1189 | SGGB_1183 | SGGBAA2069_c11790 | - | SMA_1105 (s), SMA_1106 (p) | Sinf_1022
 | pdxT | GALLO_1188 | SGGB_1182 | SGGBAA2069_c11780 | - | SMA_1104 | Sinf_1021
 | pdxR | GALLO_1111 | SGGB_1101 | SGGBAA2069_c10980 | SGPB_0968 | SMA_1031 | Sinf_0955
Folic acid (B9) | folC | GALLO_1233 | SGGB_1227 | SGGBAA2069_c12240 | SGPB_1087 | SMA_1137 | Sinf_1067
 | folE | GALLO_1232 | SGGB_1226 | SGGBAA2069_c12230 | SGPB_1086 | SMA_1136 | Sinf_1066
 | folP | GALLO_1231 | SGGB_1225 | SGGBAA2069_c12220 | SGPB_1085 | SMA_1135 | Sinf_1065
 | folB | GALLO_1230 | SGGB_1224 | SGGBAA2069_c12210 | SGPB_1084 | SMA_1134 | Sinf_1064
 | folK | GALLO_1229 | SGGB_1223 | SGGBAA2069_c12200 | SGPB_1083 | SMA_1133 | Sinf_1063
 | folD | [illegible] | SGGB_0594 | [illegible] | SGPB_0494 | [illegible] | [illegible]
Nicotinamide (NAD, B3) | nadA | GALLO_1937 | SGGB_1920 | SGGBAA2069_c18890 | SGPB_1759 | SMA_1844 (p) | [illegible]
 | nadB | GALLO_1936 | SGGB_1919 | SGGBAA2069_c18880 | SGPB_1758 | SMA_1840 (s), SMA_1842 (s), SMA_1843 (p) | [illegible]
 | nadC | GALLO_1935 | SGGB_1918 | SGGBAA2069_c18870 | SGPB_1757 | SMA_1839 | [illegible]
 | nadE | GALLO_0477 | SGGB_0451 | SGGBAA2069_c04130 | SGPB_0377 (p), SGPB_0378 (p) | SMA_0441 | Sinf_0387
Pantothenate (B5) | panB | GALLO_0161 | SGGB_0205 | SGGBAA2069_c01790 | - | - | -
 | panC | GALLO_0160 | SGGB_0204 | SGGBAA2069_c01780 | - | - | Sinf_0173 (t)
 | panD | GALLO_0159 | SGGB_0203 | SGGBAA2069_c01770 | - | - | Sinf_0172
 | panE | GALLO_0232 | SGGB_0274 | SGGBAA2069_c02470 | SGPB_0217 | [illegible] | [illegible] (p)
Riboflavin (B2) | ribD | GALLO_0692 | SGGB_0673 | SGGBAA2069_c06490 | SGPB_0567 | - | Sinf_0572
 | ribE | GALLO_0693 | SGGB_0674 | SGGBAA2069_c06500 | SGPB_0568 | - | Sinf_0573
 | ribA | GALLO_0694 | SGGB_0675 | SGGBAA2069_c06510 | SGPB_0569 | - | Sinf_0574
 | ribH | GALLO_0695 | SGGB_0676 | SGGBAA2069_c06520 | SGPB_0570 | - | Sinf_0575
 | ribF | GALLO_1160 | SGGB_1152 | SGGBAA2069_c11480 | SGPB_1019 | SMA_1086 | Sinf_0999
Thiamine (B1, partial) | tenA | GALLO_1181 | SGGB_1175 | SGGBAA2069_c11710 | SGPB_1039 | - | Sinf_1014
 | thiE | GALLO_1178 | SGGB_1172 | SGGBAA2069_c11680 | SGPB_1036 | SMA_1100 (t) | Sinf_1011
 | thiM | GALLO_1179 | SGGB_1173 | SGGBAA2069_c11690 | SGPB_1037 | - | Sinf_1012
 | thiD | GALLO_1180 | SGGB_1174 | SGGBAA2069_c11700 | SGPB_1038 | - | Sinf_1013
 | thiI | GALLO_1346 | SGGB_1341 | SGGBAA2069_c13350 | SGPB_1268 | SMA_1273 | Sinf_1163
 | thiN | GALLO_2003 | SGGB_1987 | SGGBAA2069_c19580 | SGPB_1830 | SMA_1899 | Sinf_1712



(a) Not found; (s) Split CDSs corresponding to fragments of the original gene not yet characterized as pseudogenes; (p) Pseudogenes; (t) Truncated. 



Streptococcus suis, Streptococcus uberis, Enterococcus faecium,
Lactococcus garvieae and Pediococcus pentosaceus.
Most importantly, Lactococcus lactis or S. thermophilus
were found among these donors in 9 out of 14 S. macedonicus
GIs and the same applies for S. infantarius in 6 out
of 12 GIs. None of the GI sequences of the other SBSEC
members could be linked to L. lactis or S. thermophilus
apart from the S. gallolyticus ATCC BAA-2069 GI 6 that
exhibited 96% identity over an approximately 3 kb genomic
region of S. thermophilus JIM 8232 (data not
shown). These observations constitute additional evidence
that S. macedonicus and S. infantarius are the only
members of the complex that have extensively interacted
with the dairy L. lactis and S. thermophilus.
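
The donor criterion used for the GIs (best BLASTN hit with sequence identity > 90%) can be sketched as a small filter over BLAST+ tabular output. This is a sketch under stated assumptions rather than the authors' pipeline: it assumes the default 12-column `-outfmt 6` layout, takes the hit with the highest bit score as the "best" hit, and the query and donor names in the example are invented for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, min_identity=90.0):
    """Map each query to (subject, identity) of its top-scoring hit.

    Queries whose best hit does not exceed min_identity are dropped.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        score = float(hit["bitscore"])
        query = hit["qseqid"]
        if query not in best or score > best[query][2]:
            best[query] = (hit["sseqid"], float(hit["pident"]), score)
    return {q: (s, p) for q, (s, p, _) in best.items() if p > min_identity}


# Hypothetical example rows (not data from the study).
EXAMPLE = "\n".join([
    "GI1_seg1\tdonor_A\t98.5\t1200\t18\t0\t1\t1200\t500\t1699\t0.0\t2100",
    "GI1_seg1\tdonor_B\t91.0\t1100\t99\t0\t1\t1100\t10\t1109\t0.0\t1500",
    "GI2_seg1\tdonor_C\t85.0\t900\t135\t0\t1\t900\t1\t900\t1e-150\t800",
])

if __name__ == "__main__":
    print(best_hits(EXAMPLE))
```

Ranking by bit score before applying the identity threshold means a query is dropped when its best hit is below 90%, even if a weaker hit passes the threshold; reversing the two steps would be a different criterion.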

We then calculated the unique genes (also referred to
here as singleton genes) of S. macedonicus against the
other SBSEC species twice, with and without taking into
account the genome of S. infantarius. Results from singleton gene
analysis using EDGAR [23] were manually curated to
remove the large number of transposable elements from
the set. There was an important overlap between the
list of genes found in GIs of S. macedonicus and the 
singleton genes (Additional file 7: Table S4 and Additional 
file 8: Table S5). Again, S. macedonicus and S. infantarius 
were found to share a number of genes that are ab- 
sent from the other SBSEC genomes (Additional file 8: 
Table S5). 
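
Conceptually, running the singleton analysis twice amounts to two set differences, and the genes that appear only when S. infantarius is left out are those shared exclusively by the two species. The sketch below illustrates this with plain Python sets; it is not EDGAR's orthology procedure, and the gene family identifiers (`fam1` and so on) are hypothetical.

```python
def singletons(target, others):
    """Gene families present in the target genome and in none of the others."""
    shared = set().union(*others.values())
    return target - shared


# Hypothetical gene family content per species (illustration only).
GENE_FAMILIES = {
    "S. macedonicus": {"fam1", "fam2", "fam3", "fam4"},
    "S. infantarius": {"fam1", "fam3", "fam5"},
    "S. gallolyticus": {"fam1", "fam6"},
    "S. pasteurianus": {"fam1", "fam7"},
}

if __name__ == "__main__":
    target = GENE_FAMILIES["S. macedonicus"]
    rest = {k: v for k, v in GENE_FAMILIES.items() if k != "S. macedonicus"}
    no_sinf = {k: v for k, v in rest.items() if k != "S. infantarius"}

    with_sinf = singletons(target, rest)        # strict singletons
    without_sinf = singletons(target, no_sinf)  # S. infantarius ignored
    print(sorted(with_sinf))
    print(sorted(without_sinf - with_sinf))     # shared only with S. infantarius
```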

According to the aforementioned analysis, S. macedonicus
carries the complete biosynthetic pathways for two lantibiotic
bacteriocins, i.e. the macedocin and the macedovicin
peptides [37,38]. The presence of both antimicrobials can
provide an additional link between S. macedonicus and
the milk environment. Production of macedocin has been
observed only in milk up to now and proteolytic fragments
of casein may trigger biosynthesis of this peptide [39]. In
addition, the entire macedovicin gene cluster is practically
identical (99% sequence identity over the entire length of
the ~9.8 kb cluster) to the respective clusters of thermophilin
1277 and bovicin HJ50 found in the dairy isolates S.
thermophilus SBT1277 and S. bovis HJ50, respectively
[37]. The locus seems to have spread among the three
strains by HGT and their common dairy origin increases
the possibility that this exchange of genetic material has
taken place in milk [37].

Another evident characteristic of the S. macedonicus
genome was the presence of multiple restriction modification
(RM) systems among the singleton genes (Additional
file 9: Figure S4). Streptococcus macedonicus possesses the
highest number of RM systems within the SBSEC and it is
the only member of the group with all three types of RM
systems. A yet unresolved difference in the number and
the type of RM systems between commensal and dairy
LAB has been previously observed [40,41]. As mentioned
earlier, phages are present in milk and dairy products,
often in high numbers [42], and traditional practices (e.g.
backslopping) may promote the selection of phage-resistant
strains [40,41]. In S. thermophilus, RM systems are considered
important technological traits [8] and it has been
previously suggested that genes of the type III RM system
may provide a signature for milk adaptation [40]. Streptococcus
macedonicus has two type III RM systems, one of
which is inactive since it consists of pseudogenes. The
increased number of RM systems of S. macedonicus
compared to the other SBSEC members suggests that it
should be particularly competent in resisting invading
DNA. These findings coincide with the fact that S.
macedonicus carries the highest number of spacers in its
CRISPR (clustered regularly interspaced short palindromic
repeats) locus within the SBSEC (Additional file 10:
Table S6). Furthermore, BLASTN analysis of the spacers
in the S. macedonicus CRISPR revealed that four of them,
namely spacers 3, 5, 17 and 18, had hits in S. thermophilus
phages (e.g. phages O1205, 7201, Abc2, etc.), S. thermophilus
plasmids (e.g. pER36) or S. thermophilus CRISPR spacer
sequences (data not shown). In contrast, among the 140
spacers of the different CRISPR loci found in the other
SBSEC species, only one had a hit in L. lactis phage
1706 (spacer 35 in the CRISPR of S. pasteurianus).
These findings support the occurrence of S. macedonicus
in the same habitat as that of S. thermophilus.

In addition, S. macedonicus contains singleton genes
(several copies in some instances) coding for proteins involved
in the transport and homeostasis of metal ions
(Additional file 7: Table S4 and Additional file 8: Table S5).
Some of these genes are also shared by S. infantarius,
but not all. These genes may play a role in the transport
of copper (e.g. copA and copB), cadmium (e.g. cadA and
cadC), manganese (e.g. mntH) and magnesium (e.g.
SMA_2044). Copper and cadmium have no evident
biological role for Lactobacillales [43] and thus transport
systems for such metals in S. macedonicus should
be perceived as a protective mechanism against their
deleterious effects (e.g. through oxidative stress). The
presence of metal transport genes has been previously
reported in several LAB including L. lactis and S. thermophilus
strains [43-48]. In our opinion the high number of
metal transport associated genes in S. macedonicus was an
unexpected observation and further investigation is required
regarding their physiological relevance.

Distribution of virulence factors (VFs) within species of 
the SBSEC 

One of the main goals behind the genome sequencing of
S. macedonicus was to clarify its pathogenic potential.
Unfortunately, despite the well-known association of S.
bovis with human disease, especially endocarditis and
colon cancer, there is very little knowledge about the
pathogenicity mechanisms employed by members of the
SBSEC. In Table 5 we have gathered genes previously
assigned as potential VFs in SBSEC. The available studies
have shed some light on the ability of S. gallolyticus to
colonize host tissues, a step that is considered a
prerequisite for the initiation of the infection by this
bacterium. Streptococcus gallolyticus UCN34 contains
three pilus gene clusters which may mediate binding to
the extracellular matrix (ECM), similarly to the clinical
isolate TX20005 whose genome is partially characterized
[25,49]. The pil1 and pil3 of strain UCN34 have been
found identical to the acb-sbs7-srtC1 and sbs15-sbs14-
Table 5 Genes in the Streptococcus bovis/Streptococcus equinus complex identified as putative virulence factors

Virulence factor | Gene | S. gallolyticus UCN34 | S. gallolyticus ATCC 43143 | S. gallolyticus ATCC BAA-2069 | S. pasteurianus ATCC 43144 | S. macedonicus ACA-DC 198 | S. infantarius CJ18
Pilus 1 (pil1) | acb | GALLO_2179 | SGGB_2211 | SGGBAA2069_c21760 | - (a) | - | -
 | sbs7 | GALLO_2178 | SGGB_2210 | SGGBAA2069_c21750 | SGPB_1938 (p) | - | -
 | srtC1 | GALLO_2177 | SGGB_2209 | SGGBAA2069_c21740 | - | - | Sinf_1876
Pilus 2 (pil2) |  | GALLO_1570 | SGGB_1568 | SGGBAA2069_c15960 | - | - | -
 |  | GALLO_1569 | SGGB_1567 | SGGBAA2069_c15950 | - | - | -
 |  | GALLO_1568 | SGGB_1566 | SGGBAA2069_c15940 | - | - | -
Pilus 3 (pil3) | sbs15 | GALLO_2040 | SGGB_2022 | SGGBAA2069_c19980 | SGPB_1847 | SMA_1939 | Sinf_1744
 | sbs14 | [illegible] | SGGB_2021 | SGGBAA2069_c19970 | SGPB_1846 | SMA_1938 | Sinf_1743
 | srtC3 | GALLO_2038 | SGGB_2020 | SGGBAA2069_c19960 | SGPB_1845 | SMA_1937 | Sinf_1742
Cell envelope proteinase (lactocepin) | [illegible] | GALLO_0748 | SGGB_0730 | SGGBAA2069_c07210 | - | - | Sinf_0588
Fructan hydrolase | sbs10 | GALLO_0112 | SGGB_0110 | SGGBAA2069_c01280 | - | - | -
Collagen adhesin | sbs13 | [illegible] | [illegible] | SGGBAA2069_c19910 | SGPB_1839 (p), SGPB_1840 (p) | SMA_1932 (s), SMA_1933 (p), SMA_1934 (s) | Sinf_1737 (p)
Collagen adhesin | sbs16 | GALLO_0577 | [illegible] | [illegible] | - | - | -
Cell surface protein antigen C (PAc) |  | [illegible] | SGGB_0154 | [illegible] | - | - | -
 |  | - | [illegible] | [illegible] | - | - | -
Surface-exposed histone-like protein A | hlpA | [illegible] | [illegible] | SGGBAA2069_c05790 | SGPB_0505 | SMA_0597 | Sinf_0517
Autolysin | atlA | GALLO_1368 | SGGB_1362 | SGGBAA2069_c13580 | SGPB_1289 | SMA_1299 | Sinf_1186 (t)
Glucan biosynthesis gene cluster |  | GALLO_1052 | - | SGGBAA2069_c10370 | - | - | -
 |  | GALLO_1053 | SGGB_1042 | SGGBAA2069_c10380 | - | - | -
 | rggA | GALLO_1054 | SGGB_1043 | SGGBAA2069_c10390 | - | - | Sinf_0876
 | gtfA | GALLO_1055 | SGGB_1044 | SGGBAA2069_c10400 | - | - | Sinf_0877
 | rggB | GALLO_1056 | SGGB_1045 | SGGBAA2069_c10410 | - | - | -
 | gtfB | GALLO_1057 | SGGB_1046 | SGGBAA2069_c10420 | - | - | -
 | sbs2/gbpC | GALLO_1058 | SGGB_1047 | SGGBAA2069_c10430 | - | SMA_0989 (p), SMA_0990 (s), SMA_0991 (s) | -
Hemicellulose biosynthesis gene cluster |  | GALLO_0364 | SGGB_0392 | SGGBAA2069_c03530 | - | SMA_0392 (p) | Sinf_0344
 |  | GALLO_0365 | SGGB_0393 (p), SGGB_0394 (p) | SGGBAA2069_c03540 (s), SGGBAA2069_c03550 (s) | - | SMA_0393 (p) | -
 |  | GALLO_0366 | SGGB_0395 | SGGBAA2069_c03560 | - | SMA_0394 (p) | Sinf_0345
 |  | GALLO_0367 | SGGB_0396 | SGGBAA2069_c03570 | - | - | Sinf_0346 (s)
Hemolysin TLY |  | GALLO_0630 | SGGB_0605 | SGGBAA2069_c05730 | SGPB_0499 | SMA_0591 | Sinf_0511
Hemolysin III |  | GALLO_1262 | SGGB_1256 | SGGBAA2069_c12530 | SGPB_1172 | SMA_1191 | Sinf_1093
Hemolysin A family protein |  | GALLO_1799 | SGGB_1786 | SGGBAA2069_c17570 | SGPB_1603 | SMA_1706 | Sinf_1530
Exfoliative toxin B |  | - | - | - | - | - | Sinf_0933
Macrophage infectivity potentiator protein |  | - | - | - | - | - | Sinf_0931

(a) Not found; (p) Pseudogenes; (s) Split CDSs corresponding to fragments of the original gene not yet characterized as pseudogenes; (t) Truncated.
srtC3 loci of strain TX20005, respectively, but their
additional predicted pilus gene cluster (i.e. pil2 vs. sbs12-sbs11-srtC2)
was only distantly related [25]. While all
three strains of S. gallolyticus carry the three pilus loci (as
found in strain UCN34), S. macedonicus, S. pasteurianus
and S. infantarius carry only the pil3 locus. Functional
analysis indicated that pil1 is a crucial factor of S. gallolyticus
UCN34 for binding to ECM, especially to collagen
[18]. The preference of S. gallolyticus to bind to collagen
is of particular importance, since it may allow the adherence
of the bacterium to the collagen-rich surfaces of
damaged heart valves and (pre)cancerous sites [50]. Besides
the pilus loci, additional MSCRAMM (microbial
surface recognizing adhesive matrix molecules) proteins
have been predicted in S. gallolyticus, most of which are
either absent or pseudogenes in S. macedonicus, S. pasteurianus
and S. infantarius (Table 5) [49]. The cell surface
protein antigen c (PAc) also appears exclusively in the S.
gallolyticus genomes, sometimes in more than one copy.
Only the surface-exposed histone-like protein A (HlpA)
and the autolysin (AtlA) are universally conserved in the
SBSEC. HlpA has been shown to be a major heparin-binding
protein regulating the adherence of S. gallolyticus
to the heparan sulfate proteoglycans at the colon
tumor cell surface [51]. AtlA is a fibronectin-binding protein
which is a VF of S. mutans associated with infective
endocarditis [52]. Furthermore, S. gallolyticus UCN34
carries loci for the biosynthesis of insoluble glucan
polymers from sucrose and the synthesis of hemicellulose
[25]. Insoluble glucan polymers may contribute to feedlot
bloat in cattle [25], while hemicellulose could play a role
in biofilm formation [53]. It is possible that the production
of these polymers may vary among strains of S. gallolyticus
(Table 5). Streptococcus macedonicus is devoid of the biosynthetic
gene cluster of glucan, while the hemicellulose
synthesis operon seems to consist of pseudogenes.
Similarly, S. pasteurianus and S. infantarius also seem to be
unable to synthesize either sugar polymer, owing to full
or partial absence of the genetic loci.

More genes whose products may be implicated in
other interactions with the host cells beyond adherence
could be identified. Despite the fact that the SBSEC
members are considered non-hemolytic (as members of
the group D streptococci), S. gallolyticus ATCC BAA-2069
has been reported to cause alpha-hemolysis on Schaedler
Agar with 5% sheep blood [54]. Three hemolysins
are conserved among the SBSEC members (Table 5).
Sequence analysis of Sinf_1513 and Sinf_1683, also annotated
as hemolysin genes, was not supportive of a
hemolysin protein product (data not shown). Apart
from hemolysins, a putative exfoliative toxin B (Sinf_0933)
and a macrophage infectivity potentiator protein (Sinf_0931)
are present in the S. infantarius genome [20]. Similar
genes can be found in S. thermophilus strains but not in
the other SBSEC species and in our opinion functional
analysis is required to verify these annotations.

In order to expand our investigation for putative pathogenicity
traits, we screened the genomes of S. macedonicus
and its related SBSEC species using the VFDB (virulence
factors database) [55] and the genes determined to encode
putative VFs during this analysis are presented in Additional
file 11: Table S7. Current results of comparative pathogenomics
have allowed the classification of available streptococcal
VFs in nine categories, i.e. adhesion factors, DNases,
exoenzymes, immune evasion factors, immunoreactive
antigens, factors involved in metal transport, proteases,
superantigens and toxins [56]. The general profile of
VFs for the six streptococci under investigation was rather
similar and we determined a number of previously unidentified
potential VFs dispersed among all or some of
the SBSEC members. Several of these genes coding for
putative VFs like the agglutinin receptor, the fibronectin/
fibrinogen-binding protein (fbp54/pavA), the lipoprotein
rotamase A (slrA), the plasmin receptor/GAPDH multifunctional
protein, the streptococcal enolase exoenzyme,
the pneumococcal surface antigen A and specific proteases
(i.e. cppA, htrA/degP and tig/ropA) have been experimentally
correlated with the virulence of pathogenic streptococci
beyond SBSEC members [57-67]. Some genes
were also involved in the production of a capsule that
enables bacterial cells to evade phagocytosis (Additional
file 11: Table S7) [68]. According to our analysis, all SBSEC
streptococci carry a main gene cluster spanning practically
the same position in the chromosome that could be involved
in the biosynthesis of a capsule (Additional file 12:
Figure S5). Even though the cps clusters are identical between
S. gallolyticus UCN34 and ATCC BAA-2069 [54],
multiple sequence alignment indicates significant structural
diversity in the rest of the strains. The existence of
dispersed pseudogenes in the gene clusters of S. infantarius
and S. macedonicus (e.g. SMA_0865 and SMA_0866) may
prohibit the production of capsule substances. It should be
emphasized that the strains of the SBSEC had no hits in
several major categories of streptococcal VFs (e.g. DNases,
immunoreactive antigens, superantigens and toxins), supporting
a reduced pathogenic potential for the SBSEC in
general.

Conclusions 

In this study we presented the analysis of the first complete
genome sequence of a dairy isolate of S. macedonicus.
While comparative analysis among specific subgroups
of the SBSEC species has been previously presented
[20,22,25,54], comparative genomics of the six complete
genome sequences was missing. Most importantly, the
inclusion of S. macedonicus in this analysis provided a
better opportunity to assess niche adaptation of the
SBSEC species that was so far limited by the presence of
only one dairy isolate (i.e. S. infantarius CJ18) among four
clinical strains.

Our findings clearly support two distinct evolutionary
patterns within the SBSEC. On the one hand, S. gallolyticus
is a species without apparent genome decay and the
available genomes suggest that it is a robust bacterium
able to thrive in the rumen of herbivores. On the other
hand, the remaining SBSEC species, i.e. S. macedonicus, S.
pasteurianus and S. infantarius, exhibit decreased genome
sizes accompanied by increased percentages of potential
pseudogenes due to extensive genome decay, suggesting
adaptation to nutrient-rich environments. This does not
necessarily mean that the environment to which the three
species have adapted is the same. The three species
appear to have a reduced ability to catabolize complex plant
carbohydrates and to detoxify substances met in the
rumen, which indicates that they must have deviated from
this niche. It has been proposed that S. pasteurianus may
now reside in the human gut [22], while S. infantarius
presents adaptations to milk [20]. Streptococcus macedonicus
also possesses traits that may contribute to growth in
the dairy environment, like the extra lactose gene cluster
and its proteolytic system. However, all SBSEC strains,
including clinical isolates, seem to be competent in the
metabolism of lactose and galactose or the degradation
of milk proteins. Taking into account these shared characteristics
of all SBSEC species, it is tempting to speculate
that their common ancestor may have been able to grow
in milk.

In our opinion, several genome traits per se suggest
adaptation of S. macedonicus to milk. This hypothesis is
also supported by the predicted interspecies interactions
of S. macedonicus with other bacteria. As has been
recently reported for S. infantarius [20], the S. macedonicus
genome may have acquired genes originating from L. lactis
and S. thermophilus through HGT. The predicted exposure
of S. macedonicus to S. thermophilus phages, based on our
CRISPR sequence analysis, is also in favor of this theory.
No such evidence was found for the rest of the SBSEC
members apart from S. infantarius. These findings are in
accordance with the frequent isolation of S. macedonicus
from dairy products [13] and the prevalence of S. infantarius
in certain African fermented milks [20]. One additional
question that arises is whether S. macedonicus and S. infantarius
are specialized dairy microbes like S. thermophilus.
We believe that the available data do not support this
idea. Traits of milk adaptation have been shown to be
strain-specific in S. infantarius [20]. In addition, the genome
size of S. macedonicus is significantly larger, containing
a higher number of functional genes in comparison to S.
thermophilus. Streptococcus macedonicus and S. infantarius
may thus represent intermediate evolutionary stages analogous
to those followed by the ancestors of S. thermophilus
before it became today's starter culture.



Thus, the safety concerns raised from the presence of 
SBSEC members in foods remain, even if reports implicating
S. macedonicus in disease are rather scarce [69,70].
Our comparative genomic analysis showed that both S.
macedonicus and S. infantarius lack several VPs that are
highly conserved in S. gallolyticus. However, the interpretation
of these findings becomes complicated, as the available
genome of the human blood isolate S. pasteurianus
ATCC 43144 also exhibited diminished traits of pathogenicity,
similarly to the two dairy SBSEC members. Overall,
our analysis provides evidence in agreement with the 
clinical perception that the members of the SBSEC are
lower grade streptococcal pathogens [10]. In terms of food 
safety, the dairy SBSEC could thus constitute a risk factor 
similar to the presence of enterococci that are widely 
found in fermented products, but cause no major problem 
for the average healthy and adult consumer. Nevertheless, 
it is the correlation of the SBSEC microorganisms with 
human endocarditis and colon cancer in particular that 
may require special considerations. For example, it has 
been proposed that members of the SBSEC like S. gallolyticus
may be part of the etiology of colon cancer by
causing chronic inflammation [10]. In order to assess 
the pathogenicity of this group of streptococci, more 
research is needed on the specific mechanisms employed 
by SBSEC members to cause disease. More comparative
and functional genomics studies of SBSEC genomes
are needed, covering additional species
of the complex, like the recently sequenced Streptococcus
lutetiensis [71]. New clinico-epidemiological studies should
also be undertaken in view of the most recent changes in 
the taxonomy of the SBSEC [72]. In the meantime,
assuming the worst-case scenario, we propose that
the presence of SBSEC members, including S. macedonicus
and S. infantarius, in foods should be avoided until their
pathogenicity status is resolved.

Methods 

Sequencing and annotation of the genome of Streptococcus 
macedonicus ACA-DC 198 

The genome of S. macedonicus ACA-DC 198 was sequenced
and annotated as described previously [19]. In
brief, we employed a sequencing strategy involving
shotgun/paired-end pyrosequencing and shotgun Illumina
sequencing with the 454 GS-FLX (Roche Diagnostics,
Basel, Switzerland) and the HiSeq 2000 (Illumina, San
Diego, CA), respectively. Sequences were assembled into two
contigs corresponding to the complete genome sequence
and the pSMA198 plasmid of S. macedonicus. The hybrid
assembly was validated against an NheI optical map of the
S. macedonicus genome generated at OpGen Technologies,
Inc. (Madison, WI). The genome was annotated using the
RAST [73] and the BASys [74] pipelines. Predictions of the
two pipelines were compiled into a single annotation file
after manual curation in the Kodon software environment
(Applied Maths N.V., Sint-Martens-Latem, Belgium). Final 
corrections and quality assessment of the annotation were 
performed with the GenePRIMP pipeline [21]. GenePRIMP 
was also used for the identification of putative pseudogenes. 
The circular map of the S. macedonicus genome was gen- 
erated by the DNAPlotter software [75]. 

Comparative genomics of Streptococcus macedonicus 
ACA-DC 198 against related members of the SBSEC 

The complete genome sequence of S. macedonicus was
compared to those of S. gallolyticus strains UCN34, ATCC
43143 and ATCC BAA-2069, S. pasteurianus ATCC 43144
and S. infantarius CJ18 using a variety of tools. In order
to visualize conserved genomic regions or chromosomal 
rearrangements, whole genome sequence alignments 
were performed by progressiveMAUVE [24]. Estimation 
of the differential gene content of the genomes, as well 
as whole genome phylogeny of streptococci was carried 
out within the EDGAR software framework [23]. Venn 
diagrams were designed with the VennDiagram package 
in R [76]. The glycobiome of the SBSEC members was 
determined based on the pre-computed data available in 
the CAZy database [26]. 
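
The differential gene content described above (core genome and singleton genes) can be illustrated with plain set operations. The following is a minimal sketch and not the EDGAR implementation: it assumes that every gene has already been assigned to an ortholog group, a step that EDGAR performs itself and that is not reproduced here. The function names, the group identifiers (og1, og2, ...) and the gene content are hypothetical and serve only as an example.

```python
"""Toy illustration of core-genome and singleton calculations.

Assumption: every gene has already been mapped to an ortholog group.
The group identifiers (og1, og2, ...) and the gene content below are
invented for illustration and are not taken from the SBSEC genomes.
"""


def core_genome(genomes):
    """Return the ortholog groups present in every genome."""
    group_sets = [set(groups) for groups in genomes.values()]
    if not group_sets:
        return set()
    return set.intersection(*group_sets)


def singletons(genomes, target, exclude=()):
    """Return groups found in `target` but in no other compared genome.

    Genomes named in `exclude` are left out of the comparison.
    """
    others = set()
    for name, groups in genomes.items():
        if name != target and name not in exclude:
            others |= set(groups)
    return set(genomes[target]) - others


# Hypothetical gene content, keyed by genome name.
TOY_GENOMES = {
    "S. macedonicus": {"og1", "og2", "og3", "og5"},
    "S. gallolyticus": {"og1", "og2", "og4"},
    "S. infantarius": {"og1", "og3", "og4"},
}

if __name__ == "__main__":
    print("core:", sorted(core_genome(TOY_GENOMES)))
    print("singletons:", sorted(singletons(TOY_GENOMES, "S. macedonicus")))
    print(
        "singletons without S. infantarius:",
        sorted(singletons(TOY_GENOMES, "S. macedonicus",
                          exclude=("S. infantarius",))),
    )
```

In this toy data set, og5 is the only singleton of S. macedonicus when all genomes are compared, while og3 is added once S. infantarius is excluded. This mirrors the two singleton calculations reported in Additional file 8, where genes shared only by S. macedonicus and S. infantarius also appear.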

Additional analysis 

Sequence similarity searches were performed with the 
BLAST suite [77]. Whenever necessary, protein sequences 
were analyzed in the CDD [78]. Figures showing similarity 
of gene clusters were constructed with the Easyfig compari- 
son visualizer [79]. Potential VPs included in the VFDB [55] 
were identified in the SBSEC genomes with mpiBLAST, 
as implemented in the mGenomeSubtractor website 
[80]. In brief, the entire VFDB was uploaded as the ref- 
erence sequence in the mGenomeSubtractor website 
and each genome was used as the query sequence. Only 
hits with H-value homology score > 0.6 were considered
significant. CRISPRs were analyzed by the tools available 
in the CRISPRcompar web-service [81]. A general bit score 
cutoff value of 42.0 was applied during BLASTN of 
CRISPR spacers. GIs were identified and visualized by 
the IslandViewer application that utilizes three differ- 
ent prediction tools (i.e. IslandPick, SIGI-HMM and 
IslandPath-DIMOB) relying on either sequence compos- 
ition or comparative genomics [36]. Genomic regions of 
RM systems were determined in the REBASE genomes 
database [82]. 
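
The two thresholds mentioned above (H-value > 0.6 for virulence-related hits and a bit score cutoff of 42.0 for CRISPR spacer hits) can be applied to BLAST tabular output as sketched below. This is an illustration under stated assumptions, not the procedure of the mGenomeSubtractor or CRISPRcompar services. In particular, the H-value formula used here, H = i × (lm / lq), with i the fractional identity of the best-scoring hit, lm its alignment length and lq the query length, is our understanding of the mGenomeSubtractor definition and is not stated in this article. We also assume that the bit score cutoff is inclusive. All sequence identifiers and numbers are hypothetical.

```python
import csv
import io

# Column order of the default BLAST+ tabular output (-outfmt 6).
BLAST6_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_blast6(text):
    """Yield one dict per hit from tab-separated BLAST output."""
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(BLAST6_FIELDS, row))
        rec["pident"] = float(rec["pident"])
        rec["length"] = int(rec["length"])
        rec["bitscore"] = float(rec["bitscore"])
        yield rec


def h_value(pident, match_length, query_length):
    """Assumed H-value: fractional identity times query coverage."""
    return (pident / 100.0) * (match_length / query_length)


def significant_vf_hits(records, query_lengths, min_h=0.6):
    """Map each query to its H-value if the best hit exceeds min_h."""
    best = {}
    for rec in records:
        current = best.get(rec["qseqid"])
        if current is None or rec["bitscore"] > current["bitscore"]:
            best[rec["qseqid"]] = rec
    result = {}
    for query, rec in best.items():
        h = h_value(rec["pident"], rec["length"], query_lengths[query])
        if h > min_h:
            result[query] = h
    return result


def spacer_hits(records, min_bitscore=42.0):
    """Keep spacer hits at or above the bit score cutoff (assumed inclusive)."""
    return [rec for rec in records if rec["bitscore"] >= min_bitscore]


def _table(rows):
    return "\n".join("\t".join(row) for row in rows)


# Hypothetical hits of genome proteins (queries) against VFDB entries.
VF_EXAMPLE = _table([
    ["gene_1", "VF_a", "95.0", "300", "15", "0", "1", "300", "10", "309", "1e-100", "500"],
    ["gene_2", "VF_b", "80.0", "100", "20", "0", "1", "100", "5", "104", "1e-10", "90"],
])
QUERY_LENGTHS = {"gene_1": 310, "gene_2": 400}  # hypothetical lengths

# Hypothetical hits of CRISPR spacers against phage sequences.
SPACER_EXAMPLE = _table([
    ["spacer_1", "phage_x", "100.0", "30", "0", "0", "1", "30", "200", "229", "1e-8", "56.5"],
    ["spacer_2", "phage_y", "95.0", "20", "1", "0", "1", "20", "50", "69", "0.5", "32.3"],
])

if __name__ == "__main__":
    print(significant_vf_hits(list(parse_blast6(VF_EXAMPLE)), QUERY_LENGTHS))
    print([r["qseqid"] for r in spacer_hits(list(parse_blast6(SPACER_EXAMPLE)))])
```

Because the default tabular output does not report the query length, the sketch takes it from a separate dictionary. In practice it could be read from the query FASTA file.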

Availability of supporting data 

The data set supporting the phylogenetic tree presented
in Additional file 1: Figure S1 of this article is available
in the Dryad repository (doi:10.5061/dryad.7d039;
http://datadryad.org/). Additional data sets supporting
the results of this article are included within the article
and its additional files.



Additional files 



Additional file 1: Figure S1. Whole genome phylogeny of the
Streptococcus genus. The phylogenetic tree was constructed using the 
EDGAR tool based on complete genome sequences of streptococci. The 
branch of the members of the Streptococcus bovis/ Streptococcus equinus 
complex (SBSEC) is delimited by a bracket. 

Additional file 2: Table S1. Core genome analysis among Streptococcus
gallolyticus UCN34, Streptococcus gallolyticus ATCC 43143, Streptococcus
gallolyticus ATCC BAA-2069, Streptococcus macedonicus ACA-DC 198 and
Streptococcus pasteurianus ATCC 43144 calculated using the EDGAR 
software. 

Additional file 3: Table S2. Core genome analysis among 
Streptococcus gallolyticus ATCC 43143, Streptococcus infantarius CJ18, 
Streptococcus macedonicus ACA-DC 198 and Streptococcus pasteurianus
ATCC 43144 calculated using the EDGAR software. In this analysis S. 
gallolyticus ATCC 43143 was selected as a representative of the S. 
gallolyticus species, since it has the largest genome among the
three sequenced strains. 

Additional file 4: Table S3. Glycobiome analysis of Streptococcus 
gallolyticus UCN34, Streptococcus gallolyticus ATCC 43143, Streptococcus 
gallolyticus ATCC BAA-2069, Streptococcus pasteurianus ATCC 43144, 
Streptococcus macedonicus ACA-DC 198 and Streptococcus infantarius 
CJ18 using the CAZy database. 

Additional file 5: Figure S2. Circular maps of the Streptococcus bovis/ 
Streptococcus equinus complex genomes highlighting the regions 
corresponding to genomic islands (GIs). GIs are coloured within the 
circular maps according to the tool that predicted each one of them: 
green, orange and blue were predicted with IslandPick, SIGI-HMM and 
IslandPath-DIMOB, respectively. The integrated GIs are presented at 
the periphery of the map in red colour. The black line plot represents 
the GC content (%) of the genomic sequences. Numbering of the GIs 
for each genome starts from the first GI found after position 0 of the
genome in a clockwise direction. 

Additional file 6: Figure S3. Analysis of the genomic island (GI) 4 of
Streptococcus macedonicus ACA-DC 198 presented as an example of a GI
potentially originating from multiple donors. In the graphical summary of the 
BLASTN results arrows indicate the best BLASTN hits with > 90% sequence 
identity corresponding to: a. Streptococcus thermophilus MN-ZLW-002 genomic 
sequence (96% sequence identity); b. Lactococcus garvieae 21881 plasmid 
pGL3 sequence (98% sequence identity); c. Streptococcus intermedius B196 
genomic sequence (96% sequence identity); d. Streptococcus thermophilus 
MN-ZLW-002 genomic sequence (99% sequence identity) and e. Streptococcus 
thermophilus MN-ZLW-002 genomic sequence (99% sequence identity). 

Additional file 7: Table S4. Genes within each integrated GI of
Streptococcus macedonicus ACA-DC 198 as determined by IslandViewer. 

Additional file 8: Table S5. The singleton genes of Streptococcus 
macedonicus ACA-DC 198 calculated against the other members of the 
Streptococcus bovis/ Streptococcus equinus complex (SBSEC) using the 
EDGAR software. The singleton genes of S. macedonicus were calculated 
twice, with and without the genome of S. infantarius. Thus,
genes shared only by S. macedonicus and S. infantarius among the SBSEC
members also appear in the table. 

Additional file 9: Figure S4. Circular maps of the Streptococcus 
bovis/ Streptococcus equinus complex genomes highlighting the regions 
corresponding to restriction modification systems (RMs). RMs are 
presented as predicted in the REBASE database. Colours and symbols are 
explained at the bottom of the figure.

Additional file 10: Table S6. Comparison of the CRISPR/Cas systems 
among members of the Streptococcus bovis/Streptococcus equinus 
complex using CRISPRcompar. 
Additional file 11: Table S7. Genes in the Streptococcus bovis/
Streptococcus equinus complex identified as virulence factors within 
the VFDB. 

Additional file 12: Figure S5. Multiple sequence alignment of the 
capsule biosynthetic gene cluster found in the genomes of the 
Streptococcus bovis/ Streptococcus equinus complex after BLASTN analysis. 
Grey shading represents the % identity among the nucleotide sequences 
according to the colour gradient presented at the lower right corner of 
the figure. Potential pseudogenes are marked with a "p". 